{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 4. Isomorphism\n",
    "\n",
    "(c) 2019, Dr. Ramil Nugmanov; Dr. Timur Madzhidov; Ravil Mukhametgaleev\n",
    "\n",
    "Installation instructions of CGRtools package information and tutorial's files see on `https://github.com/cimm-kzn/CGRtools`\n",
    "\n",
    "NOTE: Tutorial should be performed sequentially from the start. Random cell running will lead to unexpected results. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pkg_resources\n",
    "if pkg_resources.get_distribution('CGRtools').version.split('.')[:2] != ['3', '1']:\n",
    "    print('WARNING. Tutorial was tested on 3.1 version of CGRtools')\n",
    "else:\n",
    "    print('Welcome!')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# load data for tutorial\n",
    "from pickle import load\n",
    "from traceback import format_exc\n",
    "\n",
    "with open('molecules.dat', 'rb') as f:\n",
    "    molecules = load(f) # list of MoleculeContainer objects\n",
    "with open('reactions.dat', 'rb') as f:\n",
    "    reactions = load(f) # list of ReactionContainer objects\n",
    "\n",
    "m2, m3 = molecules[1:3] # molecule\n",
    "m7 = m3.copy()\n",
    "m7.standardize()\n",
    "r1 = reactions[0] # reaction\n",
    "m5, m6 = r1.reactants[:2]\n",
    "m8 = m7.substructure([4, 5, 6, 7, 8, 9], as_view=False)\n",
    "m9 = m6.substructure([5, 6,7, 8], as_view=False) # acid\n",
    "m10 =  r1.products[0].copy()\n",
    "\n",
    "benzene = m3.substructure([4,5,6,7,8,9], as_view=False) \n",
    "cgr1 = m7 ^ m8 \n",
    "cgr1.reset_query_marks() \n",
    "carb = m10.substructure([5,7,8, 2])\n",
    "m2.reset_query_marks()\n",
    "\n",
    "from CGRtools.containers import *\n",
    "from CGRtools import CGRpreparer\n",
    "preparer = CGRpreparer() \n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4.1. Molecules Isomorphism\n",
    "CGRtools has simple substructure/structure isomorphism API. In backend VF2 algorithm from `NetworkX` library is used. \n",
    "\n",
    "Note, that atoms are matched in subgraph isomorphism only if they have same charge/multiplicity and isotope options."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "m7"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "m8"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "benzene.standardize()\n",
    "benzene"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# isomorphism operations\n",
    "print(benzene < m7)  # benzene is substructure of m7\n",
    "print(benzene > m7)  # benzene is not superstructure of m7\n",
    "print(benzene <= m7) # benzene is substructure/or same structure of m7\n",
    "print(benzene >= m7) # benzene is not superstructure/or same structure of m7\n",
    "print(benzene < m8) # benzene is not substructure of m8. it's equal\n",
    "print(benzene <= m8)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "m5"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "m6"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Mappings of substructure to structure can be returned using substructure.get_substructure_mapping(structure, limit=1) method. Argument limit is the number of mappings that one wants to be returned, limit=0 means to return all possible mappings. Method acts as generator.\n",
    "\n",
    "To get mapping upon structure search structure1.get_mapping(structure2) method was developed. It returns only one possible mapping of all atoms for two isomorphic molecules. This functionality was developed to reorder atoms of two MoleculeContainers in the same order (the dictionary that is given by this method could be directly fed to remap function, see above) for some reaction handling issues. If molecules are isomorphic it works faster than get_substructure_mapping.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "m5.get_substructure_mapping(m6)  # mapping of m5 substructure into m2 superstructure"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "for m in m5.get_substructure_mapping(m6, limit=0):  # iterate over all possible substructure mappings\n",
    "    print(m)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "benzene.get_mapping(m8)  # mapping of benzene into m8 - also benzene."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4.2. Reactions\n",
    "ReactionContainers do not support isomorphism due to ambiguity. But molecules in reaction can be matched.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "try:            # it is not possible to match molecule and reaction. Error is returned\n",
    "    m6 < r1\n",
    "except TypeError:\n",
    "    print(format_exc())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "r1.products[0] # see structure in products"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "m6 # substructure used. One can see, they should not match"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "any(m6 < m for m in r1.products) # check if any molecule from product side has m6 as substructure"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4.3 CGR\n",
    "Substructure search is possible with CGRContainer. API is the same as for molecules.\n",
    "\n",
    "Matching CGR into CGR and molecule into CGR is possible. **Note that only conventional bonds in CGR could match moleculear bonds.** \n",
    "\n",
    "Equal atoms in isomorphism is atoms with same charge/multiplicity and isotope numbers in reactant and product states\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "decomposed1 = preparer.decompose(cgr1) # let's have a look at reaction corresponding to cgr1\n",
    "decomposed1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "m8 # this's the substructure we are looking for"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "m8 < cgr1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "cgr1 <= cgr1"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4.4 Queries"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# to use QueryContainers neighbors and hybridization for molecules need to be calculated\n",
    "m9.reset_query_marks()\n",
    "m10.reset_query_marks()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "m9 # acid"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "m10 # ether"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "carb"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print('m9:', f'{m9:hn}') # all labels were calculated\n",
    "print('m10:', f'{m10:hn}')\n",
    "print('carb:', f'{carb:hn}') # notice that one of oxygen atom has 2 neighbors. Only ester could fit this restriction."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Molecules isomorphism don't take into account neighbors and hybridization"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "carb < m9 # carb currently is molecule projection. It fit this molecule as well."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "carb < m10 # carb is a substructure of m10"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One need to convert molecule (or it's projection) into QueryContainer object. In this case number of neighbors and hybridization data will be taken into account upon substructure search.\n",
    "\n",
    "API of isomorphism is the same."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "q = QueryContainer(carb)  # convert molecule into query\n",
    "print(q)     # now one can see that in signature of QueryContainer. See that one of oxygen has 2 neighbors."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "q < m9 # now neighbors and hybridization are taken into account."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Acid m9 has hydroxyl group with one non-hydrogen neighbor. Our query requires existence of one oxygen atom with two non-hydrogen neighbors."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "q < m10 # ester matches to query."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "m2.reset_query_marks()\n",
    "m2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "q < m2 # this molecule does q as substructure as well. It is acid."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "cgrtools",
   "language": "python",
   "name": "cgrtools"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
